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Abstract 

Dynamic networks models describe a growing number of important 
scientific processes, from cell biology and epidemiology to sociology and 
finance. There are many aspects of dynamical networks that require sta- 
tistical considerations. In this paper we focus on determining network 
structure. Estimating dynamic networks is a difficult task since the num- 
ber of components involved in the system is very large. As a result, the 
number of parameters to be estimated is bigger than the number of ob- 
servations. However, a characteristic of many networks is that they are 
sparse. For example, the molecular structure of genes make interactions 
with other components a highly-structured and therefore sparse process. 

Penalized Gaussian graphical models have been used to estimate sparse 
networks. However, the literature has focussed on static networks, which 
lack specific temporal constraints. We propose a structured Gaussian dy- 
namical graphical model, where structures can consist of specific time dy- 
namics, known presence or absence of links and block equality constraints 
on the parameters. Thus, the number of parameters to be estimated is 
reduced and accuracy of the estimates, including the identification of the 
network, can be tuned up. Here, we show that the constrained optimiza- 
tion problem can be solved by taking advantage of an efficient solver, 
logdetPPA, developed in convex optimization. Moreover, model selection 
methods for checking the sensitivity of the inferred networks are described. 
Finally, synthetic and real data illustrate the proposed methodologies. 



1 Introduction 



Graphical models are powerful tools for analyzing relationships between a large 
number of random variables. Their fundamental importance and universal ap- 
plicability are due to two main factors. Firstly, graphs can be used to represent 
complex dependence relationships among random variables. Secondly, their 
structure is modular which means that complex graphs can be built from many 
simpler graphs. 

A graph consists of a set of nodes and a set of links between these nodes. 
In a graphical model the nodes are associated with random variables and links 
represent conditional dependence between the nodes. In a Gaussian graphical 
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model (GGM) it is assumed that these random variables follow a multivariate 
normal distribution. An important property of GGMs is that the concentration 
matrix, i.e. the inverse of the covariance matrix, fully determines represents the 
conditional independence structure, i.e. the graph. 

The maximum likelihood estimator (MLE) for the concentration matrix is 
the inverse of the sample covariance matrix, if it exists, and it exits when the 
number of observations is greater than the number of random variables. How- 
ever, in many modern applications experiments consists of observing many fea- 
tures and many fewer observations. In genetics, variables are typically thousands 
of genes or gene products, whereas there are at best only a few hundred of ob- 
servations. Moreover, genetic networks are sparse, which means many of the 
genes behave conditionally independent from the others. In terms of GGMs, 
it means that mos t of the elemen t s in t he precision matrix are equal to zero 
( Jeong et ai . 2004 ). As iDempstei (1972) pointed out, parameter reduction in- 
volves a trade-off between costs and benefits, i.e. the amount of noise in a fitted 
model due to errors of estimation is reduced but err ors of miss - specifi cation 



are introduced because the null values are incorrect. iDempsteij (|1972[ ) intro- 
duced sparse covariance matrix estimation. The subset of zeros in the precision 
matrix can be selected in various ways analogous to the various forward and 
backward procedures used for selecting predictors variables in multiple regres- 
sion analysis. However, the forward and backward procedures are not suitable 
for high-dimensional data problems. 

£i-penalized likelihood inference (jTibshirani Il996f ). which has been exten- 
sively applied in regression models, can be adapted to graphical models in order 
to estimate sparse graphs. The idea is to constrain the £i-norm penalty, i.e. the 
sum of the absolute values of the inverse of the covariance matrix, to be less 
or equal to a tuning parameter. The smaller the tuning parameter, the more 



zeroes will be estimated in the precision matrix. iMeinshausen and Biihlmann 
( 20061) proposed to select edges for each node in the graph by regressing the vari- 
able on all other variables using £i penalized regression. This method reduces 
to solving p separate regression problems, and does not provide an estimate 
of the matrix itself. Penalized maximum likel i hood a pproaches using the £ i 



penalty have been considered by Yuan and Lin ( 2007 ): Baneriee et al. ( 20081 ): 



Friedman et al. ( 2008 ): Rothman et all ( 2008[ ). who have all propos ed different 



algori thms for computing the estimator of the precision matrix. iFan and Lil 
(|200ll ) introduced cl i pped absolute deviation penalty (SCAD). On the other 
hand, iLam and Fall (|2009[ ) extended this penalized maximum likelihood ap- 
proach to general non-convex penalties. Alternative penalized estimators based 
on the pse u do-lik elihood instead of the likelihood were recently proposed by 



Peng et al. ( 2009() . Theoretical properties of the £i penalized maximum likeli 



hood e stimator in the large p scena ri o wer e d erived by Meinshausen and Biihlmannl 



(200i) as well as iRothman et al 



Lam and Fan 



(l2009l) 



established a 

property of the penalized likelihood estimator, which 



(120081 ). 

so called "sparsistency' 
means that all parameters that are zeroes are estimated as zero with probability 
tending to one when the sample size increases. 

The complexity of the GGMs can be reduced if one imposes some symmetry 
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constraints on the precision matrix, i.e. the number of para meters to be esti- 

mated is reduced by introducing equahty constraints. Recently. iHoisgaard and Lauritzeii 
(200i) ])roposed Gaussian graphical models with symmetry constraints. An im- 
portant motivation for considering structured GGMs is that conditions for max- 
im um likelihood estimates to exist are less restrictive than for standard GGMs. 
As lH0isgaard and Lauritzen ( 2008 ) pointed out symmetry models are particu- 
larly useful when parsimony is needed, for example when estimating precision 
matrices of large dimension with relatively few observations. 

The idea of this paper is to combine symmetry models and graphical lasso for 
modelling dynamic graphical models. In particular, we propose two models that 
impose symmetries constraints on the precision matrix and on the conditional 
correlation matrix i7, which is the negative scaled concentration matrix. The 
latter model is useful when the random variables are not measured on the same 
scale. We structure graphical models in a way that comes naturally to time- 
course data, by using the idea of coloured graphs to define factorially-structured 
precision matrices. 

In Section [5J we introduce two motivating examples: a time-course microar- 
ray experiment involving T-cells and an educational experiment involving the 
Edu dataset. In Section [31 we introduce the idea of coloured graphs for time- 
course datasets and Gaussian graphical models. The factorial graphical lasso 
for both precision, FGLe, and conditional correlation matrices, FGLq, are dis- 
cussed in detail and the convex optimization problem with linear constraints 
is explained in section |4l Section [5] addresses model selection and parameter 
smoothing selection which are important issues in graphical lasso. In particu- 
lar, classical approache s such as AIC, BIG are derived to do model selection, 
and stability selection ( Meinshausen and Biihlmaiml . 2010[ ) has been adapted 
to factorial graphical lasso models. Section IH] provides encouraging numerical 
results in a simulation study. In Section [3 T-cell and Edu datasets are studied 
in full detail. 



2 Motivating examples 

An important issue in system biology is to understand the system of interactions 
among several biological components such as protein-protein interaction and 
gene regulatory networks. Hence, several techniques have been developed to 
collect data from different organisms. For instance, microarrays measure gene 
expression levels, i.e. the concentration of messenger RNA produced when the 
gene is transcribed. 

A single microarrays is a snapshot in time of the expression of genes of an 
organism. Gene expression, therefore, is a temporal process, which evolves dy- 
namically in response to internal, genomic, and external, environmental, cues. 
Even under stable conditions, mRNA is transcribed continuously and new pro- 
teins are generated. This process is highly regulated. In many cases, the expres- 
sion programme starts by activating a few transcription factors, which in turn 
activate many other genes that act in response to the new state. Transcription 
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factors are proteins that bind to specific DNA sequences, thereby controlhng 
the flow, i.e. transcription, of genetic information from DNA to mRNA. Tak- 
ing a snapshot of the expression profile following a new condition can reveal 
some of the genes that are specifically expressed in the new state. But rather 
than determining the set of differentially expressed genes, such as in the early 
days of microarray experiments, biologists are more interested in determining 
the transcriptional programme, i.e. determining the functional pathways. In 
order to infer the temporal interaction between the genes, it is necessary to 
perform time-course expres sion experiments. The time-course genetic T-cell 



dataset (jRangel et al\ . 12004) is described in subsection 12.11 

Many educational studies aim to establish the effect of one or more vari- 
ables on one or more classroom outcomes. Often such observational studies 
collect a large number of variables on a large number of disparate scales. Of- 
ten various proxy measures are available for these variables and therefore it is 
not uncommon to expect a complex set of relationships between the variables. 
Our second motivating example i s a longitudinal educa.tiona l study described in 



Opdenakker and Maulanal (|2012l ): lOpdenakker et al\ (|201lh . We describe this 



example in subsection 12.21 The aim is to find a network that describes the 
relations among the items measured in the experiment. This second example 
clarifies the difference between the use of FGLq and FGLq. 

2.1 Example 1: Human T-cell microarray data 

Two cDNA microarray experiments were performed to collect gene expression 
levels for T-cell activation analysis. Activation of T-cell was produced by stim- 
ulating the cells with two treatments: the calcium ionosphere and the PKC 
activator phorbol ester PMA. The human T-cells coming from the cellular line 
Jakart were cultured in a laboratory. When the culture reached a consistency 
of 10^ cells/ml, the cells were treated with the two treatments PMA and PKC. 
Gene expression levels for 88 genes were collected for the following times after 
the treatments: 0, 2, 4, 6, 8, 18, 24, 32, 48, 72 hours. In the first experiment 
the microarray was dived such that 34 subarray were obtained. Each of these 
34 subarray contained the strands of the 88 genes under investigation. Strands 
are the complementary base for the rRNA which is the transcribed copy of a 
single strand of DNA after the process of transcription. In the second microar- 
ray experiment the microarray was dived into 10 sub-arrays. Each of these 
10 sub-arrays contained the strands of the 88 genes under investigation. Each 
microarray was composed by ten different slides which were used for the two 
experiments to collect temporal measurements. For time a set of cells were 
hybridized to the first slide after the cell cultured reached the right density and 
before the treatments were applied. For the second time point (time 2), another 
set of cells were hyb ridized in a second slide. The experiment was conducted by 



(|Rangel et aLl . 120041 ). 

At this point we assume that the technical replicates are independent sam- 
ples, and that the temporal replicates are dependent replicates from the same 
samples. These two assumptions result in a dataset with 44 independent repli- 



4 



cates. These are strong assumptions, but can be justified from the underly- 
ing samphng scheme. Nevertheless, this means that the conclusions from the 
analysis on T-cell, shown in sec tion [3 should be cr itically considered. Two 



further steps were conducted by iRangel et Ml (|200l to obtain a set of genes 



that were highly expressed and normalized across the two microarrays. Firstly, 
thirty genes with high variability between the two microarrays and within the 
same time point were removed. Secondly, normalization methods were applied 
to remove systematic variation du e to ex perimental artifacts. The normaliza- 



tion method used b y iRangel et al\ (|2004l ) is described in the paper written by 



Bolstad et a?.l(|2003^ . This results in a 44 i.i.d. 10 x 58 dimensional observations. 

In this subsection, we describe a possible partition for the maximum likeli- 
hood estimator of the precision matrix that is the inverse of the sample variance- 
covariance matrix. For illustrative purposes, we select a subset of 4 genes 
r = {ZFN, CGN, SIV, SCY} across 2 time points T = {1, 2}. Then, we com- 
pute and show the empirical concentration matrix for the selected subset 
based on 44 observations in Table [T] In particular, can be partitioned as 
follows, 

Drr \ ^-i^f Drr A «-i 

where o is the elementwise Hadamard matrix product, IpT, Irr are identity 
matrices, and Drr is a square matrix with ones off the diagonal. 

At this point we notice that four terms are showed in the summation and each 
of these terms can be interpreted. The first term indicates how well the variance 
of (geneij)i£r.jeT is predicted given the rest. For example it is easier to predict 
the behaviour of gene SIV than the behaviour f gene SCY. The second shows 
self-self conditional independence at temporal lag 1. The self-self conditional 
independence represent relationships between the same genes measured into 
different time points. For example, gene ZNF is conditional dependent with 
itself given the rest of the genes and the dependence seems to be very strong 
—0.93. So, if gene ZNF is upregulated at time then it will be upregulated at 
time 1. The third term indicates the conditional dependencies at time and 1 
between genes at temporal lag zero. For the t-cell experiment this means that 
one is considering the relations between genes before the effect of the treatment. 
For example, gene CCN and SIV can be considered conditionally independent 
(0.02 and 0.06) since for both the temporal lag the element of the precision 
matrix is small. The last term indicates the conditional dependencies between 
time and 1 at lag zero. In Section [3] we will refer to these partitions of the 
concentration matrix as natural partitions. Moreover, we will use the idea of 
coloured graphs to make a different partition for the different temporal lags. 
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Table 1: Maximum likelihood estimator of the precision matrix or empirical 
concentration matrix S^^ based on 44 replicates for 4 genes measured across 2 
time points. The number of genes measured in the T-cell experiment was 58 
across 10 time points but we randomly selected four genes to give an illustration 
of the estimated precision matrix. The lower part of the matrix can be read 
from the upper part since the matrix is symmetric. 



Time 1 

Gene ZNF CCN SIV SCY 


2 

ZNF CCN SIV SCY 


ZNF 1.38 -0.05 -0.50 0.25 
CCN - 1.58 0.02 -0.39 
SIV - - 1.61 -0.13 
SCY - - - 1.29 


-0.20 -0.12 -0.01 -0.11 
-0.12 -0.93 -0.02 0.07 
-0.17 0.12 -0.78 -0.02 
0.05 0.25 0.49 -0.09 


ZNF _ _ _ _ 
CCN _ _ _ _ 
SIV - 

SCY _ _ _ _ 


1.10 -0.18 0.04 0.08 
- 1.78 0.06 0.42 
- 1.58 0.09 
1.16 



2.2 Example 2: Educational study. 

The education dataset is a longitudinal dataset in which a set of scores regarding 
teacher and student behaviour were collected across 5 different time points, 
to wit 0,1,4,7,10 months. The study was conducted in the Netherlands and 
the students were foUowed - up during their first yea r of th e secondary school 



(jOpdenakker and Maulanal l2012t lOpdenakker et al\ . 120111 ). In particular, 20 



classes of students and 24 educational scores were considered. 

For illustrative purposes, we consider a subset of four scores by naming 
the variables: Controlled (Contr), Autonomies (Auton), Influence (Infl) and 
Proximity (Prox). The first two scores concern student behaviour, taking values 
between -3 and 3. The last two scores measure teacher behaviour and take values 
between and 5. The scale for these variables is not compatible and conditional 
correlations are therefore more meaningful than the concentrations. In section 
[3l we show that conditional correlations are invariant under changes of scale for 
individual variables. The empirical conditional correlations based on 20 classes 
of students for 4 scores measured across 2 time points are shown in the upper 
triangular block (italic numbers) of the matrix in Table [2] while the conditional 
covariances are shown in the lower triangular block. 

Similarly the example in the preview Subsection 12.11 we can partition the 
empirical conditional correlation matrix in 4 parts as follows: 

were BT^ is equal to the negative of the scaled precision matrix. This means 
that the conditional elements on the lower part have been divided by the square 
root of the two corresponding elements on the diagonal r^j = yj^a^n- At 
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Table 2: Empirical conditional correlations (upper triangular and diagonal) and 
conditional covarianco (lower triangular) based on 20 replicates for 4 score mea- 
sures measured across 2 time points. The number of educational measures in 
the experiment was 24 across 4 time points but we randomly selected four mea- 
sures to give an illustration of the estimated precision matrix and the estimated 
scaled precision matrix. 

Time 1 2 





Subject 


Contr 


Auton 


Infl 


Prox 


Contr 


Auton 


Infl 


Prox 




Contr 


1.79 


0.31 


-0.07 





0.61 


-0.23 


0.18 


-0.06 


1 


Auton 


-0.57 


1.86 


0.03 


0.22 


-0.19 


0.57 


-0.07 


-0.03 


Infl 


0.11 


-0.05 


1.25 


-0.04 





-0.01 


0.41 


0.09 




Prox 


0.01 


-0.37 


0.05 


1.55 


0.04 


-0.03 


-0.12 


0.51 




Contr2 


-1.09 


0.35 





-0.07 


1.76 


0.31 


-0.08 


-0.01 


2 


Auton 


0.43 


-1.09 


0.02 


0.05 


-0.58 


1.96 


0.13 


0.21 


Infl 


-0.28 


0.11 


-0.53 


0.18 


0.12 


-0.21 


1.37 


0.21 




Prox 


0.10 


0.06 


-0.13 


-0.82 


0.02 


-0.39 


-0.32 


1.68 



this point each of diagonal elements is in the range — 1 and 1 and it corresponds 
to a conditional correlation measure. The interpretation of these measures is 
equivalent to the interpretation of the elements of the precision matrix. How- 
ever, conditional correlations are pure numbers, i.e. the scale of the variables do 
not have any influence, and they are in a range between —1 and 1 so that num- 
bers close to zero indicates conditional independence, as before, and numbers 
close to one or -1 indicate strong positive and negative conditional dependen- 
cies, respectively. Whereas of the conditional covariance elements a positive 
conditional correlation indicate a positive relations between the corresponding 
genes. 

3 Factorial graphical model for dynamic net- 
works 

An undirected graph G = {V, E), where y is a set of vertices corresponds to a set 
of random variables, and E the set of links corresponds to a set of conditional 
independencies, is defined a Gaussian graphical model if the set of random 
variables follow a Gaussian distribution. Vertices of the graph G in static dataset 
have no natural order. Whereas for longitudinal dataset or time-course dataset 
a natural order can be established. For example, gene ZNF in the T-cell dataset 
is measured at time 1 (which is gene expression are collected after hour of 
treatment) and at time 2 (which is after two hours of treatment). Vertexes 
ZNF, CCN, SIV and SCY are called natural vertexes. Generally speaking, let 
G = {V,E) be a graph where V = {vjt)j^r,teT is a finite set of vertices and 
E CV X V is a. subset of ordered pairs of distinct vertices. The set of natural 



7 




Figure 1: Example of a dynamic coloured graph with four natural vertices 
measured across two time points. The natural partitions of equation[T]have been 
represented with different type of the lines. The first term is not represented, 
The second term is represented by dot lines. The third term is represented by 
dashed and continuous lines. 



vertices is F = {1, . . . , nr}, where T — {1, . . . , nx} is an ordered set, typically 
describing time points. Note that the set F is unlabelled, but here described as 
an ordered set for convenience of notation. Moreover, we have seen in example 
1 and 2 that the empirical correlation matrix or the empirical concentration 
matrix can be written as a sum of four terms. These four terms are called 
natural partitions, and they can be represented in a dynamic graph as shown 
in Figure [TJ Here, we assume that the last term of formula [T] is zero so that 
genes at time 1 and at time 2 are assumed conditionally independent. The first 
term of formula[T]is not represented in Figure[T] The second term is represented 
by the dot lines and it represents the self-self interactions. The third term is 
represented by the dashed and continuous lines. 

We will refer to the graph in Figure [1] as dynamic graphs, which can be 
formally defined as follows: 

Definition 3.1 (Dynamic graphs) A dynamic graph is a pair G = {V,E), 
where V = {fy jjgp ^g-p is a finite set of vertices and E C V x V is a set of 
ordered couples of elements. F and T are finite sets. 

Hence, the main characteristic of dynamic graphs is that the same vertices are 
measured across different time points. In the T-cell example the same 56 genes 
were measured across 10 time points and between time point t and t -\- 1 the 
experimental conditions were changed. We notice that conditional correlations 
between the natural vertices at time point one and time point 2 described in 
Table [2] are similar. On the end the diagonal elements in the first block of Table 
[1] and in the second block are similar. Moreover, it is intuitive to think that 
the conditional correlation between gene 1 at time 1 and gene 1 at time 2 can 
be considered equivalent or that self-self correlations between genes across two 
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time points can be consider to be equal. In order to formalize this idea we use 
of coloured graphs. 

Definition 3.2 (Coloured Graph) A coloured graph G = {V,E,F) is a triplet, 
where G = {V, E) is a graph and F is a mapping on the links, i.e. 

F:E^G, 

where C is a finite set of colours. 

Thus, a coloured graph induces partitions on the graph by F, that is different 
subsets of E are visualized with different colours. For example, three different 
colours (i.e, three different type of lines) are represented in Figure [TJ The dash 
line represents one colour (or sub-partition), the continuous line represents the 
second colour and the dot line represents the third colour. Let us denote the 
partitions induced by the coloured graphs F hy E ~< F, i.e.: 

E^F^UceC'F^^nE, 

where E < F indicates that the partition is induced on E, and E stands for the 
complete set of links. The mapping can be applied to the subsets of E which 
are St and Ni. Each partition represents relationships between natural vertices 
at some time point t € T. 

Let {Si}"^Q^, and {Ni}^^^^^ be subsets of vertices and hnks where Si,Ni 
are natural partitions such that: 

St = {{{vjt,v-j^t+t), {vj^t+t,Vjt)}\j eT,t^l,...,nT-i}, 

and 

= {{ivjt,Vk,t+t), {vk,t+i,Vjt)}\yj 7^ A: e r,< = 1, . . . ,nT - 

Each of these partitions is interpreted as follows: Si considers the lag i interac- 
tions between the same natural vertices, and Ni is a graph at time lag i. We 
induce further partitions on Si and Ni by using the idea of coloured graphs in 
order to give more consistent interpretations of dynamic graphs. In particu- 
lar, we consider four mappings. Firstly, E ^ Fi indicates that all edges in the 
partition are coloured with the same colour. Secondly, E -< Ft indicates that 
all edges in the partition are coloured with colours which are the same within 
natural vertices. Thirdly, E < F^ indicates that all edges in the partition are 
coloured with the same colour within time points and different colours across 
natural vertices. Finally, E -< F^t indicates that all edges in the partition 
are coloured differently across time points and natural vertices. This can be 
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Table 3: Number of colours for any combination of Si, Ni and graph colouring. 
Si,Ni represent natural partitions of E, where E is a set of links. The natural 
partitions are sub partitions of the set E. 



Factor 5*0 


No S, 


Ni 


^2 


N2 




El 1 


1 1 


1 


1 


1 




Et riT 


TiT riT — 1 


riT — 1 


riT - 2 


riT — 


2 


Er nr 


inr(nr - 1) n-r 


nr(nr - 1) 


nr 


7^^(f^^ 


-1) 


ErT ^^^?^T 


inr(nr — 1)?it nr{nT — 1) 


nr(nr - 1) 


nrifiT - 2) 


?^^(^^^ - 


-1) 






X {riT — 1) 




X (riT - 


-2) 



summarize with the following functions: 

Fi : E^C 

yv,,Vj£E: Ei{v,) = Eiivj) 
Et : E^C 

yvit,js,Vku,iv £ E.iit^u and s = v. 

FT{Vit,js) = ET{vku,lv) 

Et : E^C 

yvitjs,Vku,iv e E,\ii^k and j ^ I : 
-Fr(witjs) = ET{vku,iv) 
Ett '■ E^C 

no restriction 

where we can substitute aX E b. specific natural partition Si or Ni for i = 
1, . . . , T - 1. Thus, Si -< Ei and Ni -< Ej, where i,j = Fi, Ft, Et, Ett induce 
partitions as follows: 

Oi — \Di 

and 

N^ = {^r}™=i- 

For example. Si -< Et induces the following partition of Si: 

S, = {Slu...U 

where Sj = {{(wjt, i'i,t+i), (wi,t+i, Vjt)}\j G F, t 1, . . . , - «}. Edges belong- 
ing to 5** are coloured with the fiT — i colours C — {Ci, . . . , Cm^-i}, respectively. 
We abuse notation and let Ni -< and S'i ^ in order to express that Ni = {0} 
and Si = {0}, i.e. the graph G does not contain such edges. 

In Table [3] we show the number of colours for any combination of Si, Ni and 
graph colouring. The total number of colours nc can be calculate from Table 
[3l Figure [T] shows an example of "coloured graph" where vertices {vij)i^T,jeT 
are all of the same colour and edges with the same line styles are of the same 
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colours. The graph resembles the following model: 

[So ^FuNo^ Ft, S,^F,,N-,^ 0], (2) 
where, firstly the following natural partitions (sub-partitions) are created: 

50 = {Vli,Vi2,...,VpT}, 

51 = {(Wll,t^l2), (^^21,^^22), (W31,W32), (W41,W42)}, 

-^0 = {(Wll,^^2l), (t^21,^^4l), («41,f3l), («31,fll), {vu,V4l), (W21,^^3l), 

= (W12,W22), (^^22,^^42), (^^42,^^32), (W32,'yi2), (W12,W42), (^^22,^^32)}- 

Note that if a couple {vij,Vki) is present then the couple {vij,Vki) is present too. 
We have omitted the symmetric couples from the sets to simplify the notation. 
Secondly, 5*0 ~< Fi induces the following subset: 

5*0 = {vil,Vi2,...,VpT}, 

so that a colour is created. Then, A^o -< Ft brings the following two sub- 
partitions: 

-^0 = {(l'll)^^2l), (w21,W4l), (W41,'f3l): ("31 , ''''11 ) , (wil,W4l), (w21,W3l)}, 
-^0 = {(^'12,?^22), (^'22,'y42), (^'42,t'32), (W32,'yi2), (wi2,W42), (W22,'y32)}, 

so that two colours have been created. 

In order to consider a graphical model nodes of the dynamic graph need 

to be associated with the random variables. Then, we can assume that nodes 
follow a multivariate normal distribution. In what follows, wc consider the set 
of vcirticcs and nodes of the dynamic coloured graph. Moreover, wc denote 
with Y = {Yy^.)y..^v G K'""^ the set of random variables. Each vertices V = 
(vij)ier,jeT is related to a random variable Y,,..^. 

Definition 3.3 (Dynamic graphical models) A graphical model M = (G, P) 
is a couple G and P, where G is a dynamic graph and F is a probability dis- 
tribution on Y satisfying som,e Markovian properties, i.e. set of conditional 

independence relations encoded by the undirected edges E. 

In order to connect the idea of coloured graphs and graphical models we give 
the following definition: 

Definition 3.4 (Factorial dynamic graphical models) A factorial graphi- 
cal model M = [G, P, F) is a triplet (G, P, F), where G is a dynamic graph, P is 
a probability distribution and F is a mapping on the natural partitions Si and 
Ni, for i = 1, . . . , T — 1 and T is the total number of time points. 

Assume that Y ~ S) follows a multivariate normal distribution then we 

have a factorial Gaussian graphical models wh(;rc pairwisc conditional indepen- 
dencies are equivalent to zeros in the concentration matrix = i.e: 

Yvij -L ^Vki\^V\{vij,Vki} ^{ij,kl} = 0- 
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The scaled off diagonal elements i^ij.ki\v\{ij,ki} 



are the negative 



of conditional correlation coefficients, where i,k G T and j,l G T for i j and 
k ^ I. For future use, we denote $7 = {i^ij^ki\v\{ijM]) be a matrix of scaled 
elements of 0. 

For example, the factorial Graphical model with graph represented in Fig- 
ure [U Y = (Yii, Yi2, . . . , I42) vector of random variables which correspond to 
vertices {wn, W12, ■ • ■ , W42} and relations [5*0 -< Fi,No ~< Ft, Si ^ Fi,Ni ^ 0] 
given in ^ imply the following precision matrix: 



= 



" 01 


02 


02 


03 








02 


01 


02 





03 





02 


02 


01 








03 


03 








01 


04 


04 





03 





04 


01 


04 








03 


04 


04 


01 



We have described the idea of coloured graphs which allows us to create sub- 
partitions of natural partitions. Moreover, equality constraints on conditional 
correlations can be imposed such that edges with the same colours imply these 
equality restrictions. Here, we define a set of design matrices X which is useful to 
directly connect elements of a coloured graph with concentration or conditional 
correlation matrix parameters. 

A coloured graph that defines partitions on E, i.e. {S™} and {A^™}, can 
be associated with two sets of design matrices X"^ = {X'^* }"=o m=i ^^"^ = 
{^^^}Zo!S=i '^bere X^J",X^''" G R^^ and X^^" can be uniquely identified 
such that 

otherwise 



and 



if {vjt 



e NT 



] otherwise 
We re-define the set of design matrices as: 

X = {X^X^} = {Xi,.. 



AT" 



-- ns + njq is the total number 
of colours, 71-5 is the total number of colours for the self-self partitions and njv 
is the total number of colours for the network partitions. 



where X^ = ^r^i X^", Xf = Y^Zi X 



3.1 Structured Gaussian graphical models for conditional 
concentration matrix 

The design matrix X can be used to induce the following parametrization on 
0, i.e. 



= Ex 



(m)/ 



m=l 
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Table 4: Estimated conditional covariance based on 44 replicates for 4 genes 
measured across 2 time points. The number of parameter estimated is 3, that 
is the number of colour in the graph is 3. Note that the following model So -< 
Fi,No -< Fi,Si -< Fi with the rest Si,Ni ^ has been imposed. 



Time 




1 








2 






Gene 


ZNF 


CCN 


SIV 


SCY 


ZNF 


CCN 


SIV 


SCY 


ZNF 


1.11 


0.01 


0.01 


0.01 


-0.42 











CCN 


0.01 


1.11 


0.01 


0.01 





-0.42 








SIV 


0.01 


0.01 


1.11 


0.01 








-0.42 





SCY 


0.01 


0.01 


0.01 


1.11 











-0.42 


ZNF 


-0.42 











1.11 


0.01 


0.01 


0.01 


CCN 





-0.42 








0.01 


1.11 


0.01 


0.01 


SIV 








-0.42 





0.01 


0.01 


1.11 


0.01 


SCY 











-0.42 


0.01 


0.01 


0.01 


1.11 



where = {0m)^=i is a vector of unknown parameters. Because are in- 
duced by the graph the specific elements of the concentration matrix which 
correspond to edges with the same colours are constraint to be equal. Note that 
the restrictions so defined are linear in the concentration matrix. 

Example 1: Human T-cell. Consider the following factorial graphical model 
for human T-cell microarray data (see Section [2] for a description): 

[So^Fi,No^Fi,Si^Fi,Ni^Ol 

then the design matrices X — {X^, X^} ~ {Xi, X2, X3} are: 

X^o x^J = f " M x< = f ^ M x^^ = f ° " 

where D is a square matrix with 1 off-diagonal and on the diagonal. Note 
that X^i is an empty matrix since Si is an empty set which implies colours. 
Table |4] shows the estimated concentration matrix based on 44 replicates for 4 
genes measure across 2 time points. The model induces ng = 3 colours. For 



the estimation procedure see iHoisgaard and LauritzenI (|2008l ). For a faster and 



more general estimation procedure see the algorithm in subsection l4.1l 

3.2 Structured Gaussian graphical models for conditional 
correlation matrix 

It is generally important that all variables are on comparable scales if a struc- 
tured model on the concentration matrix is considered so that conclusions are 
interpretable. In contrast, model based on the conditional correlation matrix 
have proprieties of invariance under rescaling as showed in Lemma 13.11 
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Lemma 3.1 (Invariance) The inverse variance matrix & is not invariant under 
reseating Y &?/ D. Let Ti be a diagonal matrix with diagonal entries the scaled 
precision matrix is invariant. 

Proof 3.1 The inverse variance matrix 0* of^* =^ DY is given by 
0* = (DSD)^ = D^S^D^ = D^0D \ 

so 

= var{Y)-^ ^ var{Y*)-^ ^ 0*, 

so is not invariant under rescaling ofY. The conditional correlation matrix 
J7 is invariant under rescaling, i.e ft = ft* , since 

n* = So"^0*So"^ = So"^D"^0D"iSo"^So So"^0So"^ = n, 
and So = SoD-i. 

A factorial graphical model with structured conditional correlation matrix is 
obtained by restricting elements of such that: 

• all diagonal elements of O (inverse partial variances) must be identical, 
and 

• all partial correlations corresponding to edges in the same colour class 
must be identical. 

Let = SqS^So be the concentration matrix, where Sq = X]m=i X'^™6'™, 
and 

nr — 1 Si nr — 1 

i—l m—1 i—Q l—ni+1 

A structured graphical model for the conditional correlation matrix is obtained 
by partitioning ft, i.e.: 

nc 

Elements of the conditional correlation matrix $7 are related to elements of the 
natural partitions Si,Ni. Note that edges with the same colours correspond to 
specific elements of the conditional correlation matrix which are constrained to 
be equal. Restrictions are non linear in the conditional correlation matrix and 
an iterative algorithm is considered in subsection 14.11 to estimate such that 
specific elements in the conditional correlation matrix are constrained to be 
equal. 
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Table 5: Estimated conditional correlations (upper triangular) and conditional 
covariance (lower triangular and diagonal). The number of parameter estimated 
is 8 for the diagonal elements (natural partitions 6*0), 6 for the natural partition 
A'o and 4 for the natural partition Si . The total number of estimated parameter 
is 16, that is the number of colours in the graph is 16. Note that the following 



model 5*0 


^ Frr, No 




1 -< Fi 


with the rest Si 


Ni ^ has been 


imposed. 




Time 






1 








2 








Subject 


Contr 


Auton 


Infl 


Prox 


Contr 


Auton 


Infl 


Prox 




Contr 


1.32 


0.13 


0.21 


-0.14 


0.34 











1 


Auton 


-0.15 


1.08 


-0.07 


-0.07 





0.34 








Infl 


-0.26 


0.08 


1.18 


-0.07 








0.34 







Prox 


0.17 


0.08 


0.08 


1.13 











0.34 




Contr 


-0.43 











1.22 


0.13 


0.21 


-0.14 


2 


Auton 





-0.37 








-0.15 


1.12 


-0.07 


-0.07 


Infl 








-0.40 





-0.25 


0.08 


1.18 


-0.07 




Prox 











-0.38 


0.16 


0.08 


0.08 


1.14 



Example 2: Educational study. Consider the following factorial graphical 
model for educational study dataset (see Section [2] for the description): 

[So -< Ftt, No -< Fr, 5i ^ i^^i , A^i ^ 0] . 

Table 2] shows the estimated conditional concentration elements (upper triangu- 
lar) and conditional covariance elements (lower triangular and diagonal). Note 
that conditional correlation elements are constraint to be equal while concen- 
tration elements are different. 

4 Penalized maximum likelihood 

We have reached some important results by considering constraints on the con- 
centration (or conditional correlation) matrix. The number of parameters to be 
estimated can be considerable reduced. Each sub-network can be interpreted 
as its corresponding natural partition. However, dynamic genetic graphs are 
usually sparse which means that few vertices will be connected. We have given 
the concepts of sparse and dense graphs in Section |3] when the number of nodes 
tends to infinity. A measure of the density of a graph in the finite case can be 
defined as follow. For undirected graphs, density is: 

2|rT| 
^~ |rT| (|rT| - 1)' 

The maximum number of edges is ^|rr|(|rT| — 1), so the maximal density is 1 
(for complete graphs) and the minimal density is 0. 
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We could think to estimate a complete graph and to produce multiple hy- 
pothesis testing on the edges of the graph. However, model selection and pa- 
rameter estimation would be done s epara tely in this case and it would bring 
at instability of the model [Breiman ( 19961 ). Alternatively, we consider ^i-norm 
penalty on the concentration (or conditional correlation) matrix to induce spar- 
sity. The choice of a ^i-norm can be considered the unique one since other ip- 
norm, where p is typically in the range [0, 2], are not suitable in high-dimensional 
data analysis. The estimates being exactly zero for p < 1 only, while the opti- 
mization problem is convex for p> 1. Hence £i-norm occupies a unique position, 
as fc = 1 is the only value of k for which variable selection takes place while 
the opti mization problem is still convex and hence feasible for high-dimensional 
problems iBaneriee et al. 

I (I2OO8). 



4.1 Penalized likelihood for coloured graphs 

Consider a coloured graph that specifies a partition of 0, then a set of design 
matrices X is used to produce linear operators which are necessary to impose 
linear restrictions on 0. Every design matrix X''"^ m = 1,. . . ,nc consists of 
zeroes and ones and induce p — I linear constraints on when the number of 
1 in X^™^ is p. We define a linear map A = (Ai, . . . , A„p) where each X^™) 
induces a set of matrices Ai, . . . , A„^ and an element in A^*^ i = 1, . . . , rip 
assumes value —1, or 1 as described below: 

f 1 if E,, Ets before jt,gs=p-l 

^3%gs = S -1 if Y.jg Y.ts '^jlgs beforc jt, gs = p 
I otherwise, 

where 'before' is meant with respect to the total row-major ordering of the 
matrices. Now let Hp be the total number of linear constraints, then A = 
{Al, . . . , A„p} and the linear map is expressed as A : Mgym ~^ ^^'^ with: 

A(0)-[(Ai,0),...,(A„^,0)] 

where (,) is the Frobenius inner product. 

When we derive the objective function we want to take into account the 
sparsity assumption, i.e. the sum of the absolute values of the precision matrix 
should be less than p. The €i-norm constraint ||0|| < p can be written as a set 
of linear equality constraints by introducing slack variables x+,x- eR'', i.e. 

(0) := argmin{-;(0) + Ax+ + Ax"} (3) 

subject to B(0) - x+ + x" = 

h 0,x+,x" > 0. 

where B^, for i = 1, . . . fc, are symmetric matrices with just element bg^t = i>t^g = 
1 and the rest equal to zero, and k = |rT|(|rr| + l)/2 or fc = |rT|(|rr| - i)/2 
if diagonal elements are penalized or not, respectively. 
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Now we want to include the set of linear constraints A(0) which are derived 
by imposing a specific coloured graph. To achieve sparse graph structures with 
a structured a priori graph one can minimize a convex log-likelihood function, 
i.e.: 

(0) := argmin{-/(0) + Ax+ + Ax"} (4) 



subject to ^(0) = 

B(0) x+ + x = 
h 0,x+,x" > 0. 

We have re-written the convex optimization problem in the standard form. This 
is a quadratic semi-definite log-determinant programming problem which allows 
to impose factorial graphical models. 

The non linearity of the objective function and the positive definiteness 
of the constraint make the optimization problem Q not trivial. We use an 
algorithm called LogdetPPA to find a solution of LogdetPPA employs the 
essential ideas of the proximal point algorith m (PPA), the Newton method and 
the preconditioned conjugate gradient solver ( Wang et al. . 2009f) . 



Note that LogdetPPA algorithm was developed into Matlab and it gives 
the opportunity to solve convex optimization problems with linear constraints 
which need to be implemented. We implemented linear constraints for factorial 
graphical models as described in Section |31 Moreover, it is possible to use 
function of Matlab within R. In fact, the package R. Matlab allows to connect 
Matlab and R. R is more suitable for statistical analysis and it is an open 
source software so source codes of packages are available. We took advantage 
from R. Matlab to create a virtual connection between R and Matlab so that we 
are able to solve the constraint optimization problem within R. 



LogdetPPA optimization of FGL concentration model. For FGLe prob- 
lems, we can apply the algorithm logdetPPA after having create the linear 
operators A and B. Then, is the matrix that minimizes the penalized log- 
likelihood (in among the space of all symmetric FT x FT matrices for whom the 
non linear restriction on holds. For example we used FGLe to estimate the 
matrix represented in Tabled 



LogdetPPA optimization of FGL conditional correlation model. This 
algorithm allows to introduce linear constraint but in case we are modelling 
conditional correlations the constraints are not linear. However, we overcome 
this problem by using an iterative algorithm which make use of logdetPPA after 
having found an initial guess for diag(0). The pseudo-code is described in 
Algorithm[TJ Usually, the convergence is reached after few iterations (4-10). By 
taking a starting point Sq the optimization problem (j4]) is a convex optimization 
problem, i.e., the objective function is convex on 0, and the feasible region is 
convex. 
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Algorithm 1 Calculate sparse © with structured on ft 



Require: Coloured graph Si ^ Ni ^ * and set an initial vector Sq- 

1. Find the linear maps Ai, . . . , A^- 

2. fc = 0,...,. 

3. Estimate @^''\ 

4. Set So =diag(0('=^). 

5. Replay Sq'^' with and estimate 

6. If 11©^'+'^- 0^"^ 111 <e Stop 
else fc = fc + 1, go to 3 

end. 



5 Model selection 

We have seen that estimation of considering different coloured graphs given 
a smoothing parameter A is possible inside a convex optimization framework. 
We have also proposed several factorial graphical models. In this subsection 
we address same issues on how to choose the 'best' coloured graph, and what 
should be a g ood compromise between a sp a rse an d a dense graph. In particular, 
according to iMeinshausen and Biihlmann ( 20101 ) we want to find a smoothing 



parameter such that the expected number of false positive links is taken under 
control. This aim can be reached through stability selection. Stability selection 
is similar to bootstrap idea which consists of a re-sampling procedure. For 
example we used FGLo to estimate the matrix represented in Table [5] 



5.1 Classical approaches 

Let's assume that the smoothing parameter is fixed (point-wise control) A = Aopt 
and two coloured graphs need to be compared: 

[So -< FrT, No -< Fr, Si^Fi^Ni^O], 

and 

[So -< FvT, No -< Fr, Si -< Fr, N^^O]. 

An information criterion such as AIC, BIC and AICc can be used to compare dif- 
ferent factorial graphical models. AIC typically will select more and more com- 
plex models as the sample size increases, because the maximum log-likelihood 
increases linearly with n while the penalty term for complexity is proportional to 
the number of degrees of freedom. Note that for factorial graphical models the 
number of degrees of freedom is approximated by the number of " free" parame- 
ters different from zero, i.e. the estimated elements different from zero which do 
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not belong to the same partition. AICc penalizes complexity more strongly than 
AIC, with less chance of over- fitting the model. BIC is co nstructed in a manner 



quite similar to AIC with stronger penalty for complexity (jClaeskens and Hiort 



20081) . 



5.2 Stability selection 

Usually, the choice of smoothing parameter A is crucial as it leads to the struc- 
ture of the network. In particular, if A = and there are no constraints, 
the maximum likelihood estimator is = S~^, if A — > oo, is diagonal 
which means all random variables are independent. Generalized cross vali- 



dation can be used to select the tuning parameter A — Act, but iLeng et 



(l200fih showed that given Acu the estimator of is not consistent in terms 



of variables selection. Alternatively, bootstrap ( Breiman . 19991 ) can be con 



sidered to estimate empirical distribution of each ele ment of 0. We prefer 
stability selection ( Meinshausen and Biihlmann . 2010 f) because the expected 



number of links falsely estimated is controlled and variables selection is consis- 
tent. Moreover, the choice of a smo othing parameter A becomes less important 
Meinshausen and Biihlmann ( 2010l ). We adapt stability selection to factorial 



graphical models. 

Let's consider a vector of smoothing parameters such that A e A C M+ that 
determines the amount of regularization. 

Theorem 5.1 A necessary and sufficient condition for O ii_M = for all fc € F 



and 7, 1 £ T is that \Sij_M\ < A for all i ^ k and I ^ j \Mazumder and Hastit 
i '20lA} . 



Upper and lower bounds of A, u\ and Ix respectively, are calculated such that 
u\ = max{\Sij^ki\) and l\ = min{\Sij^ki\) ioi i ^ k I ^ j, where S F and 
J, I G T . Then, for all A > ua an empty graph is estimated while for \ < l\ a. 
fully connected graph is estimated. We search the solution of the optimization 
problem (jH) for value of A into the range [l\ , ux] . The "optimal" value of A = \opt 
can be chosen by minimizing a score which measures the goodness-of-fit. 
Let graph G = (V, E) be inferred, where 

is the estimated edge set and 

denotes the active set. Let / — {1, . . . , n} be the index set for sample y^*), i € /, 
then: The set of stable edges is indicated as 

Estable = {{Vi,Vj) ■ ^ij > TTt/ir}, 

and it depends on Xopt via nf/"'. The tuning parameter nthr indicates a thresh- 
old and controls the expected number of falsely selected links. Assume that 
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Algorithm 2 Stability selection for graphical models. 
Require: n. 

• Draw sub-samples of size [n/2] without replacement, denoted by /* C 
{l,...,n}, where |/*| = [n/2]. 

• Run the selection algorithm Ex^^^ {I* ) on /* . 

• Do these steps many times and compute the relative frequencies, 



X = 0.46 ?. = 0.448 X = 0.436 X = 0.424 




Figure 2: Graph selection with cross validation and stability selection procedure. 



the joint distribution of the random variables is exchangeable and £' is a better 
choice than a random guessing, then it can be shown that 

1 

E{FP) < ^, 

27rthr -Ik 

where k is the dimension of the model (it depends on the factorial model), 
q is the number of selected variables (e.g. \E\), and FP = lE"^ n Estabie] is 
the number of falsely positive selected. This is a finite sample control, even if 
k » N. Choose E{FP) < v, then if < yk: 

TTthr = {l+q^/vk)/2, 

and TTthr & (51 1) is bounded. 

Figure [5] is taken from Meinshausen and Biihlmann ( 2010l) and it illustrates 
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Table 6: Simulation study scheme in which four scenarios are represented. The 
first column is an identification number, the second one indicates the number 
of variables per each time point (third column). The number of independent 
samples are represented in the last column. 



ID 


g g* 


t 


p n 


1 


20 


3 


60 50 


2 


- 20 




120 - 


3 


- 40 




180 - 


4 


- 60 




240 - 



that the choice of a tuning parameter A is less important with stability selection 
than cross validation. 

6 Simulation study 

We considered a simulation study to show the performance of the proposed 
model. Table |6] shows the simulation study scheme in which four different sce- 
narios are studied. Here for different scenarios we mean that the number of 
nodes, links or time points change while the structure of the networks is the 
same. 

We are mainly interested in the performance of our estimator in terms of false 
positive (FP), false negative (FN), true positive (TP) and true negative (TN). 
Let be the true and be the estimated precision matrix. These measure 
summarize: 

• the percentage of links that are falsely estimated, i.e. an element in is 
zero but is not zero in (FP) and an element in is not zero but is zero 
in (FN), 

• the percentage of links that are positively estimated, i.e. an element in 
is not zero and it is not zero in (TP) and an element in is zero and 
it is zero in (TN). 

Other measure like these are the false discovery and false not discovered. 

Whereas, we are not looking at the performance of our estimator in terms 
of distances between the "true" parameter and the estimated ones 0. An- 
other element of interest is the sign of the estimated conditional covariance. In 
fact, this element is not zero, if —sign{6ij) is positive this indicates a positive 
dependence whereas if —sign{9ij) is negative it indicates a negative dependence. 

For each scenario we simulate 100 datasets from a multivariate normal dis- 
tribution with /X equal to zero and S equal to the inverse of a precision matrices 
where the coloured graph is simulated from the following model: 

[So ^ FrT,No ^ Fr,Si ^ Fi], 
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Table 7: The average of the proportions of how raany links have been correctly 
estimated were calculated by the False Positive (FP), False Negative (FN), False 
Discovery (FD) and False not Discovery (FnD). 





FP FN FD FnD 


AICc 
1 BIC 
AIC 


0.0092 0.0811 0.2000 0.0031 
0.0363 0.0139 0.4873 0.0005 
0.0698 0.0069 0.6470 0.0003 


AICc 
2 BIC 
AIC 


0.0057 0.0447 0.2899 0.0006 
0.0088 0.0321 0.3826 0.0005 
0.0437 0.0041 0.7514 0.0001 


AICc 
3 BIC 
AIC 


0.0016 0.4585 0.2730 0.0036 
0.0016 0.4585 0.2730 0.0036 
0.0288 0.1452 0.8088 0.0012 


AICc 
4 BIC 
AIC 


0.0091 0.1034 0.1680 0.0052 
0.0396 0.0517 0.4527 0.0027 
0.0670 0.0000 0.5704 0.0000 



Table 8: Average performance for neighbourhood selection model, graphical 
lasso and structured graphical lasso. 





FP FN FD FnD 


1 glasso 
FGL 


0.002 0.977 0.667 0.035 
0.006 0.137 


2 glasso 
FGL 


0.001 0.964 0.620 0.013 
0.005 0.263 


3 glasso 
FGL 


0.001 0.988 0.906 0.008 
0.001 0.458 0.273 0.004 



while the rest is zero. Note that we keep the true network constant but we 
increase the number of nodes in the graph. Random variables associated with 
these added nodes are independent. We keep the number of replicates and 
time points constants. The number of replicates is fewer than the number 
of random variables. Tabic [7] shows the average over 100 of measures that 
is FP = J^lTi^^i' = ^'^d so on. Moreover we show that 

AICc performs b etter on average and other graphical lasso such us proposed by 
Tibshiranil (1996) does not perform well in case of structured dynamic graphical 



models (see Table [8]). 



7 FGL data analysis 

We have seen that several factorially coloured graphs can be imposed on a 
graphical model and a model selection procedure is necessary to select the "best" 



22 



Table 9: Model ordering according to AICc for T-cell dataset 











Model 










AIC 


AICc 




Fi 


No- 




Si- 




Nl r- 


■^Fr 




J Prp 


175.82 


178.19 


So-- 


Fi 


No 


- 


Si - 


^Fr 


Nl r. 


^Fr 


S2' 


- 


175.84 


178.42 


So ^ 


Fi 


TVo- 


Ftt 


Si r 


^Fr 


Ni ^ 


Ftt 


S2' 


- 


176.32 


178.60 


So ~ 


Fi 


No- 


.Fr 


Si ^ 


.Fi 


Ni 


- 


S2' 


- 


176.78 


178.93 


So- 


' 1 


No- 




Si - 




Nir. 


^Ft 


S2' 


- 


177.04 


179.19 


So - 


^ 1 




Ftt 


Si ^ 




Ni ^ 


Ftt 


S2' 


- 


176.81 


179.81 


Tlnjil 






















■[■inics 
























. , - .'1- . 





Figure 3: Selected model after model selection for T-cell dataset. 



dynamic graph. Moreover, a smoothing parameter A that regulates the sparsity 
needs to be selected. 

Let's consider several coloured graphs for Human T-cell dataset. Information 
criterion measures, for each of the top ten models, are showed in Table [SI We 
select the first coloured graph since it has the smallest AICc. This model is 
described in Figure |3l According to the coloured graph this scheme summarize 
the characteristics of the estimated graph. The networks at temporal lag are 
constrained to be equal across the five observed time points. Moreover, the 
networks at temporal lag 1 are constrained to be equal across time. There are 
no links between time t and i-|-2 since we are assuming conditional independence 
between time t and i -I- 2 except for self interactions, i.e. interactions between 
the same couple of genes. Figure |4] shows interactions between genes at lag 
(left part of the figure), and it shows interactions between genes at lag 1 (right 
part of the figure). 



8 Discussion 



As more and more large datasets become available, the need for efficient tools to 
analyze such data has become imperative. In this chapter, we have considered 
sparse dynamic Gaussian graphical models with £i-norm penalty. This type of 
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Figure 4: Representation of interactions between genes at temporal lag 0. Note 
that networks at lag at time 1,2, 3,..., 5 are equal since we impose Nq ~ 
-Fr(top). Representation of interaction between genes at temporal lag 1. Note 
that networks at lag 1 between time (1, 2), (2, 3), (3, 4), (4, 5) are equal since we 
impose A^i ~ Fr (bottom). 24 



modelling offers a straightforward interpretation, i.e. the edges of the graph 
defining the partial conditional correlations among the nodes. In particular, 
under the sparsity assumption, a large part of the precision matrix can be filled 
with zeros a priori. Based on the consideration of dynamic and model-oriented 
definitions, we are able to reduce the number of parameters to be estimated. 
We have shown that FGLe proved to be powerful on both simulated and real 
data analysis. 
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